Starling: Introducing a mesoscopic scale with Confluence for Graph Clustering

Given a Graph G = (V, E) and two vertices i, j ∈ V, we introduce Confluence(G, i, j), a mesoscopic vertex closeness measure based on short Random walks, which brings together vertices from the same overconnected region of the Graph G, and separates vertices coming from two distinct overconnected regions. Confluence is a useful tool for defining a new Clustering quality function QConf(G, Γ) for a given Clustering Γ, and for defining a new heuristic, Starling, which finds a partitional Clustering of a Graph G intended to optimize QConf. We compare the accuracy of Starling with that of three state-of-the-art Graph Clustering methods: Spectral-Clustering, Louvain, and Infomap. These comparisons are made, on the one hand, with artificial Graphs, namely (a) Random Graphs and (b) a classical Graph Clustering Benchmark, and on the other hand with (c) Terrain-Graphs gathered from real data. We show that with (a), (b) and (c), Starling always obtains an accuracy equivalent to or better than that of the three other methods. We also show that with the Benchmark (b), Starling obtains an accuracy equivalent to, and sometimes even better than, that of an Oracle that only knows the expected overconnected regions of the Benchmark and ignores the concretely constructed edges.


Introduction
Terrain-Graphs are real world Graphs that model data gathered by field work, in diverse fields such as sociology, linguistics, biology, or Graphs from the internet. Most Terrain-Graphs contrast with artificial Graphs (deterministic or Random) and share four similar properties [1-3]. They exhibit:
p1: Not many edges: m is O(n log(n)) (where m is the number of edges and n the number of vertices);
p2: Short paths (L, the average number of edges on the shortest path between two vertices, is low);
p3: A high Clustering rate $C = \frac{3 \times \text{number of triangles}}{\text{number of connected triplets}}$ (many overconnected local subGraphs in a globally sparse Graph);
p4: A heavy-tailed degree distribution (the distribution of the degrees of the vertices of the Graph can be approximated by a power law).
Clustering a Terrain-Graph consists of grouping together in Modules vertices that belong to the same overconnected region of the Graph (property p3), while keeping separate vertices that do not (property p1). These groups of overconnected vertices form an essential feature of the structure of most Terrain-Graphs. Their detection is central in a wide variety of fields, such as biology [4], sociology [5], linguistics [6] or computer science [7], and for many tasks such as the grouping of very diverse entities [8][9][10][11][12][13], pattern detection in data [14], link prediction [15], model training [16], label assignment [17], recommender Algorithms [18], data noise removal [19], or feature matching [20].
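As an illustration of property p3, the Clustering rate can be computed directly from its definition. The following is a minimal sketch, not the authors' code: the function name `clustering_rate` and the toy Graph (two triangles joined by one edge) are assumptions made for the example.

```python
from itertools import combinations


def clustering_rate(n, edges):
    """Clustering rate C = 3 * (number of triangles) / (number of connected triplets)."""
    adj = {v: set() for v in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    # Each triangle is counted once, as an unordered triple of mutually linked vertices.
    triangles = sum(
        1
        for a, b, c in combinations(range(n), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )
    # A connected triplet is a path of two edges centred on a vertex.
    triplets = sum(len(adj[v]) * (len(adj[v]) - 1) // 2 for v in adj)
    return 3 * triangles / triplets if triplets else 0.0


# Hypothetical toy Graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge {2, 3}.
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
toy_rate = clustering_rate(6, toy_edges)
```

The triple enumeration is cubic in n, which is acceptable for a toy example but not for Graphs of the size of the Terrain-Graphs studied later.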
In section 2 we place in the context of the state of the art the methods with which we compare our results: in section 2.1 we present Spectral-Clustering, one of the most popular and efficient Graph Clustering methods; in section 2.2.1 Louvain, one of the most widely used Graph Clustering methods, which optimizes Modularity, the most popular Graph Clustering quality function; and in section 2.2.2 Infomap, one of the most efficient Graph Clustering methods, which optimizes the most elegant Graph Clustering quality function.
In section 3 we present Confluence, a mesoscopic vertex closeness measure, and a new Clustering quality function $Q_{Conf}$ based on Confluence. In section 4 we compare optimality for Modularity and optimality for $Q_{Conf}$. In section 5 we propose to consider a Clustering method as a Binary Edge-Classifier By nodes Blocks (BECBB) that tries to classify each pair of vertices into one of two classes: the edges and the non-edges. In section 6 we propose a heuristic, Starling, for optimizing the objective function $Q_{Conf}$.
In section 7, we compare the accuracies, as BECBBs, of Starling, Louvain, Infomap and Spectral-Clustering. These comparisons are made, on the one hand, with artificial Graphs, namely (a) Random Graphs and (b) a classical Graph Clustering Benchmark, and on the other hand with (c) Terrain-Graphs gathered from real data. We show that with (a), (b) and (c), Starling always obtains an accuracy equivalent to or better than that of the three other methods. We also show that with the Benchmark (b), Starling obtains an accuracy equivalent to, and sometimes even better than, that of an Oracle that only knows the expected overconnected regions of the Benchmark and ignores the concretely constructed edges, which are to be predicted by the Oracle as a BECBB.
In section 8 we discuss the choice of parameters, and conclude in section 9.

Previous work
The literature on Graph Clustering is too extensive for a comprehensive review here. We concentrate on placing in the context of the state of the art the methods with which we compare our results. Let G = (V, E) be a Graph with n = |V| vertices and m = |E| edges.
Degree: The degree of a vertex i in G is $d_G(i) = |\{j \in V : \{i, j\} \in E\}|$;
Module: A Module γ of G is a non-empty subset of the Graph's vertices: γ ≠ ∅ and γ ⊆ V;
Clustering: A Clustering Γ of G is a set of Modules of G such that $\bigcup_{\gamma \in \Gamma} \gamma = V$;

Spectral Graph Clustering
Spectral Graph Clustering is one of the most popular and efficient Graph Clustering Algorithms. It generally uses the classical k-means Algorithm, whose original idea was proposed by Hugo Steinhaus [21]. Spectral Graph Clustering Algorithms work as follows (see [22]):
Algorithm 1 SGC: Spectral Graph Clustering
[. . .] 0 otherwise (I is the identity matrix in $\mathbb{R}^{n \times n}$).
(4) Compute the first κ eigenvectors $u_1, \dots, u_\kappa$ of L (see [23]).
(5) Let $U \in \mathbb{R}^{n \times \kappa}$ be the matrix containing the vectors $u_1, \dots, u_\kappa$ as columns.
(6) For i = 1, ..., n, let $y_i \in \mathbb{R}^{\kappa}$ be the vector corresponding to the i-th row of U.
(7) Cluster the points $(y_i)_{i=1,\dots,n}$ in $\mathbb{R}^{\kappa}$ with the k-means Algorithm into κ clusters $C_1, \dots, C_\kappa$.
We can notice that Spectral Graph Clustering in Algorithm 1 needs κ, the number of groups of vertices, to be known in advance as part of the Input. This is an advantage, because it gives a handle on the desired number of Modules, but how should κ be chosen when the structure of the Graph is unknown? The choice of κ is fundamental and is not a simple problem (see [23][24][25][26][27][28][29][30][31]), and the quality of the results varies greatly depending on κ, which we confirm in section 7.2.1 with Figs 7 and 8.

When we don't know the number of groups in advance
Let G = (V, E) be a Graph and Γ a Partitional Clustering of its vertices.
Clustering quality function: A Clustering quality function Q(G, Γ) is an ℝ-valued function designed to measure the adequacy of the Modules with the overconnected regions of Terrain-Graphs (property p3).
When we do not know κ, the number of groups of vertices, in advance, and are given a Clustering quality function Q, it would be sufficient, in order to establish a good Partitional Clustering for a Graph G = (V, E), to build all the possible partitionings of the set of vertices V, and to pick a partitioning Γ such that Q(G, Γ) is optimal. This method is however impractical, since the number of partitionings of a set of size n = |V| is equal to the n-th Bell number, a sequence known to grow at least exponentially [32]. Many Graph Clustering methods therefore consist in defining a heuristic that can find, in a reasonable amount of time, a Clustering Γ that tentatively optimises Q(G, Γ) for a given Clustering quality function Q.
With methods optimizing a quality function Q, we do not need to know κ the number of vertices groups in advance in the input, because κ is then a direct consequence of the quality function Q: κ will be automatically built by the optimisation of Q.
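To give an idea of why exhaustive search is impractical, the number of partitionings can be computed and, for very small sets, the partitionings themselves can be enumerated. This is a minimal sketch; the function names `bell_numbers` and `partitions` are chosen for the example and do not come from the paper.

```python
def bell_numbers(n_max):
    """Return [B_0, ..., B_n_max], computed with the Bell triangle."""
    bells = [1]
    row = [1]
    for _ in range(n_max):
        # Each new row starts with the last entry of the previous row.
        new_row = [row[-1]]
        for x in row:
            new_row.append(new_row[-1] + x)
        row = new_row
        bells.append(row[0])
    return bells


def partitions(items):
    """Yield every partitioning of a list, each as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Put `first` into each existing block in turn...
        for k in range(len(part)):
            yield part[:k] + [[first] + part[k]] + part[k + 1:]
        # ...or into a new block of its own.
        yield [[first]] + part


bells = bell_numbers(20)
n_partitions_of_5 = sum(1 for _ in partitions(list(range(5))))
```

Already for n = 20 vertices, `bells[20]` exceeds 5 × 10^13 partitionings, so an exhaustive optimisation of Q is only conceivable on toy Graphs.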
2.2.1 Louvain. The Louvain method, proposed in 2008 by Blondel, Guillaume, Lambiotte, and Lefebvre [33], is a heuristic for tentatively maximizing the quality function Modularity, proposed in 2004 by Newman and Girvan [34]. The modularity of a Partitional Clustering for a Graph G = (V, E) with m = |E| edges is equal to the difference between the proportion of links internal to the Modules of the Clustering, and the same quantity expected in a null model, where no community structure is expected. The null model is a Random Graph $G_{Null}$ with the same number of vertices and edges, as well as the same distribution of degrees as G, where the probability of having an edge between two vertices x and y is equal to $\frac{d_G(x) \cdot d_G(y)}{2m}$.
Let G = (V, E) be a Graph with m edges and Γ a partitioning of V. The definition of modularity given by Newman and Girvan in [34] is equivalent to the one we propose here in Formula 1:
$$\text{Modularity}(G, \Gamma) = \frac{1}{2m} \sum_{\gamma \in \Gamma} \sum_{i, j \in \gamma} \left( P_{edge}(G, i, j) - \frac{d_G(i) \cdot d_G(j)}{2m} \right) \quad (1)$$
where $P_{edge}(G, x, y)$ is a symmetrical vertex closeness measure equal to the probability of {x, y} being an edge of G, that is:
$$P_{edge}(G, x, y) = \begin{cases} 1 & \text{if } \{x, y\} \in E \\ 0 & \text{otherwise} \end{cases}$$
In Eq 1, the first factor $\frac{1}{2m}$ is purely conventional, so that the modularity values all live in the [−1, 1] interval, but it plays no role when maximizing modularity, since it is constant for a given Graph G.
We then define $Q_{P_{edge}}$ as Newman and Girvan's quality function, to be maximized:
$$Q_{P_{edge}}(G, \Gamma) = \sum_{\gamma \in \Gamma} \sum_{i, j \in \gamma} \left( P_{edge}(G, i, j) - \frac{d_G(i) \cdot d_G(j)}{2m} \right) \quad (5)$$
For Louvain, a good Partitional Clustering Γ as per Eq 5 is one that groups in the same Module vertices that are linked (especially ones with low degrees, but also to a lesser extent ones with high degrees), while avoiding as much as possible the grouping of non-linked vertices (especially ones with high degrees, but to a lesser extent ones with low degrees). However, several authors [35,36] showed that optimizing Modularity leads to merging small Modules into larger ones, even when those small Modules are well defined and weakly connected to one another. To address this problem, some authors [37,38] defined multiresolution variants of Modularity, adding a resolution parameter to control the size of the Modules.
For instance, [37] introduces a parameter λ ∈ ℝ in Eq 5:
$$Q^{\lambda}_{P_{edge}}(G, \Gamma) = \sum_{\gamma \in \Gamma} \sum_{i, j \in \gamma} \left( P_{edge}(G, i, j) - \lambda \cdot \frac{d_G(i) \cdot d_G(j)}{2m} \right)$$
where λ is a resolution parameter: the higher the resolution λ, the smaller the Modules get.
Nevertheless, in [39], the authors show that ". . . multiresolution Modularity suffers from two opposite coexisting problems: the tendency to merge small subGraphs, which dominates when the resolution is low; the tendency to split large subGraphs, which dominates when the resolution is high. In benchmark networks with heterogeneous distributions of cluster sizes, the simultaneous elimination of both biases is not possible and multiresolution Modularity is not capable to recover the planted community structure, not even when it is pronounced and easily detectable by other methods, for any value of the resolution parameter. This holds for other multiresolution techniques and it is likely to be a general problem of methods based on global optimization.
[. . .] real networks are characterized by the coexistence of clusters of very different sizes, whose distributions are quite well described by power laws [40,41]. Therefore there is no characteristic cluster size and tuning a resolution parameter may not help."
The Louvain method (https://github.com/10XGenomics/louvain) is non-deterministic, i.e. each time Louvain is run on the same Graph, the results may vary slightly. In the rest of this paper all the results concerning the Louvain method on a given Graph are the result of a single run on this Graph.
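The modularity of a given partitioning can be evaluated directly as a sum over pairs of vertices lying in the same Module. The sketch below is an illustration under stated assumptions, not the Louvain heuristic itself: it uses the usual pairwise form of Newman and Girvan's modularity, the `resolution` argument follows the common multiresolution form (an assumption about the exact variant of [37]), and the toy Graph is hypothetical.

```python
def modularity(n, edges, clustering, resolution=1.0):
    """Pairwise modularity of a partitioning of vertices 0..n-1 (no self-loops)."""
    adj = {v: set() for v in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    m = len(edges)
    total = 0.0
    for module in clustering:
        # Sum over ordered pairs (i, j) of the Module, including i == j.
        for i in module:
            for j in module:
                p_edge = 1.0 if j in adj[i] else 0.0
                null = len(adj[i]) * len(adj[j]) / (2 * m)
                total += p_edge - resolution * null
    # The factor 1 / 2m is conventional and does not change the optimum.
    return total / (2 * m)


# Hypothetical toy Graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge {2, 3}.
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
q_two_modules = modularity(6, toy_edges, [[0, 1, 2], [3, 4, 5]])
q_one_module = modularity(6, toy_edges, [[0, 1, 2, 3, 4, 5]])
```

On this toy Graph the two-triangle partitioning scores 5/14, while a single Module containing every vertex scores 0, which is the behaviour expected of the null-model correction.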

Infomap.
The Infomap method is a heuristic for tentatively maximizing the quality function described in 2008 by Rosvall and Bergstrom [42]. This quality function is based on the minimum description length principle [43]. It consists in measuring the compression ratio that a given partitioning Γ provides for describing the trajectory of a Random walk on a Graph. The trajectory description happens on two levels. When the walker enters a Module, we write down its name. We then write the vertices that the walker visits, with a notation local to the Module, so that an identical short name may be used for different vertices from different Modules. A concise description of the trajectory, with a good compression ratio, is therefore possible when the Modules of Γ are such that the walker tends to stay in them. This corresponds to the idea that the walker is trapped when it enters a good Module, which is supposed to be an overconnected region that is only weakly connected to other Modules.
For Infomap, a good Partitional Clustering Γ is then one that groups in the same Module vertices allowing a good compression ratio for describing the trajectory of a Random walker on G.
However, as we will see in section 7, Infomap only identifies a single Module when the overconnected regions are only slightly pronounced.
The Infomap method (https://github.com/mapequation/) is non-deterministic. In the rest of this paper all the results concerning the Infomap method on a given Graph are the result of a single run on this Graph.

Confluence, a mesoscopic vertex closeness measure
The definition of Confluence proposed in this section is an adaptation of the one proposed in [44] to compare the structures of two Terrain-Graphs.
In Eq 5, with regard to a Graph G:
• $P_{edge}(G, i, j)$ is a local (microscopic) vertex closeness measure relative to G;
• $\frac{d_G(i) \cdot d_G(j)}{2m}$ is a global (macroscopic) vertex closeness measure relative to G.
To avoid the resolution limits of Modularity described in [35][36][37][38][39], we introduce here Confluence(G, i, j), an intermediate mesoscopic vertex closeness measure relative to a Graph G, that we define below.
If G = (V, E) is a reflexive and undirected Graph, let us imagine a walker wandering on the Graph G: at time t ∈ ℕ, the walker is on one vertex i ∈ V; at time t + 1, the walker can reach any neighbouring vertex of i, with a uniform probability. This process is called a simple Random walk [45]. It can be defined by a Markov chain on V with an n × n transition Matrix [G]:
$$[G] = (g_{i,j})_{i,j \in V} \quad \text{with} \quad g_{i,j} = \begin{cases} \frac{1}{d_G(i)} & \text{if } \{i, j\} \in E \\ 0 & \text{otherwise} \end{cases} \quad (7)$$
Since G is reflexive, each vertex has at least one neighbour (itself) and [G] is therefore well defined. Furthermore, by construction, [G] is a stochastic Matrix: ∀i ∈ V, $\sum_{j \in V} g_{i,j} = 1$. The probability $P^t_G(i \rightsquigarrow j)$ of a walker starting on vertex i and reaching vertex j after t steps is:
$$P^t_G(i \rightsquigarrow j) = ([G]^t)_{i,j} \quad (8)$$
Proposition 1 Let G = (V, E) be a reflexive Graph with m edges, and $G_{null} = (V, E_{null})$ its null model such that the probability of the existence of a link between two vertices i and j is $\frac{d_G(i) \cdot d_G(j)}{2m}$. Then for all t ≥ 1 and all i, j ∈ V: $P^t_{G_{null}}(i \rightsquigarrow j) = \frac{d_G(j)}{2m}$.
Proof by induction on t:
(a) True for t = 1: since $\sum_{k \in V} d_G(k) = 2m$, $P^1_{G_{null}}(i \rightsquigarrow j) = \frac{d_G(i) d_G(j) / 2m}{\sum_{k \in V} d_G(i) d_G(k) / 2m} = \frac{d_G(j)}{2m}$.
(b) If true for t then true for t + 1: $P^{t+1}_{G_{null}}(i \rightsquigarrow j) = \sum_{k \in V} P^t_{G_{null}}(i \rightsquigarrow k) \cdot P^1_{G_{null}}(k \rightsquigarrow j) = \sum_{k \in V} \frac{d_G(k)}{2m} \cdot \frac{d_G(j)}{2m} = \frac{d_G(j)}{2m}$.
On a Graph G = (V, E) the trajectory of a Random walker is completely governed by the topology of the Graph in the vicinity of the starting node: after t steps, any vertex j located at a distance of t links or less can be reached. The probability of this event depends on the number of paths between i and j, and on the structure of the Graph around the intermediary vertices along those paths. The more short paths exist between vertices i and j, the higher the probability $P^t_G(i \rightsquigarrow j)$ of reaching j from i. On the Graph $G_{null}$ the trajectory of a Random walker is only governed by the degrees of the vertices i and j, and no longer by the topology of the Graph in the vicinity of these two nodes.
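We want to consider as "close" each pair of vertices {i, j} for which the probability of reaching j from i after a short Random walk in G is greater than the probability of reaching j from i in $G_{null}$. We therefore define the t-confluence $Conf_t(G, i, j)$ between two vertices i, j on a Graph G as follows:
$$Conf_t(G, i, j) = \begin{cases} 0 & \text{if } i = j \\ \frac{P^t_G(i \rightsquigarrow j) - P^t_{G_{null}}(i \rightsquigarrow j)}{P^t_G(i \rightsquigarrow j) + P^t_{G_{null}}(i \rightsquigarrow j)} & \text{otherwise} \end{cases} \quad (10)$$
Proposition 2 Let G = (V, E) be a reflexive Graph with m edges, and $G_{null}$ its null model such that the probability of the existence of a link between two vertices i and j is $\frac{d_G(i) \cdot d_G(j)}{2m}$. Then:
$$Conf_t(G, i, j) = \begin{cases} 0 & \text{if } i = j \\ \frac{P^t_G(i \rightsquigarrow j) - \frac{d_G(j)}{2m}}{P^t_G(i \rightsquigarrow j) + \frac{d_G(j)}{2m}} & \text{otherwise} \end{cases}$$
Proof: the result follows directly from definition 10 and proposition 1.
To prove that $Conf_t(G, \cdot, \cdot)$ is symmetric, we first need to prove proposition 3.
Proposition 3 Let G = (V, E) be a reflexive and undirected Graph. Then for all t ≥ 1 and all i, j ∈ V: $d_G(i) \cdot P^t_G(i \rightsquigarrow j) = d_G(j) \cdot P^t_G(j \rightsquigarrow i)$.
Proof by induction on t:
(a) True for t = 1: $d_G(i) \cdot g_{i,j} = d_G(j) \cdot g_{j,i}$, both sides being equal to 1 if {i, j} ∈ E and to 0 otherwise.
(b) If true for t then true for t + 1: $d_G(i) P^{t+1}_G(i \rightsquigarrow j) = \sum_{k \in V} d_G(i) P^t_G(i \rightsquigarrow k) g_{k,j} = \sum_{k \in V} d_G(k) P^t_G(k \rightsquigarrow i) g_{k,j} = \sum_{k \in V} d_G(j) g_{j,k} P^t_G(k \rightsquigarrow i) = d_G(j) P^{t+1}_G(j \rightsquigarrow i)$.
Proposition 4 For all i, j ∈ V: $Conf_t(G, i, j) = Conf_t(G, j, i)$.
Proof: If i = j: it follows directly from definition 10. If i ≠ j: multiplying the numerator and the denominator of the expression in proposition 2 by $d_G(i)$ and applying proposition 3 gives an expression that is symmetric in i and j.
Most Terrain-Graphs exhibit the properties p2 (short paths) and p3 (high Clustering rate). With a classic distance such as the shortest path between two vertices, all vertices would be close to each other in a Terrain-Graph (because of property p2). On the contrary, Confluence allows us to identify vertices living in the same overconnected region of G (property p3):
If i, j are in the same overconnected region: $Conf_t(G, i, j) > 0$ (14)
If i, j are in two distinct overconnected regions: $Conf_t(G, i, j) < 0$ (15)
where the notion of region varies according to t:
• When t = 1: $Conf_1(G, i, j) = 0$ if i = j, $\frac{2m - d_G(i) d_G(j)}{2m + d_G(i) d_G(j)}$ if i ≠ j and {i, j} ∈ E, and −1 otherwise. Confluence is a microscopic vertex closeness measure relative to G. The notion of region in this case has a radius = 1: it is the notion of neighborhood. Confluence is then independent of the intermediate structures between the two vertices i and j in G;
• When 1 < t < ∞: Confluence is a mesoscopic vertex closeness measure relative to G. The notion of region in this case has a radius = t with 1 < t < ∞: it is no longer a local notion like the notion of neighborhood. Confluence is then sensitive to the t-intermediate structures (t-mesoscopicity) between the two vertices i and j in G (see 14 and 15);
• When t → ∞: $\lim_{t \to \infty} Conf_t(G, i, j) = 0$, and Confluence is no longer sensitive to any structure in G. Indeed, we can prove with the Perron-Frobenius theorem [46] that if G is reflexive and strongly connected, then the Matrix [G] is ergodic [47], so that $\lim_{t \to \infty} P^t_G(i \rightsquigarrow j) = \frac{d_G(j)}{2m}$; the limit of $Conf_t$ then follows from definition 10 and proposition 1.
Confluence actually defines an infinity of mesoscopic vertex closeness measures, one for each Random walk of length 1 < t < ∞. For clarity, in the rest of this paper, we set t = 3 and define $Conf(G, i, j) = Conf_3(G, i, j)$.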
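The construction of this section can be sketched in a few lines of code. This is a hedged illustration rather than the authors' implementation: it assumes that Confluence is the normalised difference between the walk probability in G and the null-model probability, it takes 2m to be the sum of the degrees of the reflexive Graph (an assumption about how self-loops are counted), and the function names and the toy Graph are hypothetical.

```python
def transition_matrix(n, edges):
    """Transition Matrix of the simple Random walk on the reflexive Graph."""
    adj = [{v} for v in range(n)]  # reflexive: every vertex is its own neighbour
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    degrees = [len(a) for a in adj]
    matrix = [
        [1.0 / degrees[i] if j in adj[i] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    return matrix, degrees


def mat_mul(a, b):
    """Plain dense matrix product of two square matrices."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def confluence(n, edges, t=3):
    """Matrix of Conf_t(G, i, j), assuming the normalised-difference form."""
    step, degrees = transition_matrix(n, edges)
    walk = step
    for _ in range(t - 1):
        walk = mat_mul(walk, step)  # walk[i][j] is the t-step probability
    two_m = sum(degrees)  # assumption: 2m is the sum of degrees
    conf = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                null = degrees[j] / two_m
                conf[i][j] = (walk[i][j] - null) / (walk[i][j] + null)
    return conf


# Hypothetical toy Graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge {2, 3}.
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
conf3 = confluence(6, toy_edges, t=3)
conf1 = confluence(6, toy_edges, t=1)
```

On this toy Graph, vertices 0 and 1 (same triangle) get a positive value for t = 3 while vertices 0 and 4 (different triangles) get a negative one, and for t = 1 every non-edge gets exactly −1.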

Using a mesoscopic scale with Confluence for a new Clustering quality function
We propose here $Q^{\tau}_{Conf}$, a new Clustering quality function, which introduces a mesoscopic scale through Confluence with a resolution parameter τ ∈ [0, 1] to promote density of the Modules: [. . .]
In Eq 16, with regard to a Graph G, the term [. . .]
Therefore in Eq 18, $Q^{\tau}_{Conf}(G, \Gamma)$ gives a weight of τ to the microscopic and macroscopic structure of Γ with regard to the Graph G and a weight of (1 − τ) to the mesoscopic structure. The closer the parameter τ ∈ [0, 1] is to 1, the less Confluence is taken into account.

Optimality
A Partitional Clustering Δ is optimal for a quality function Q iff for every partitioning Γ of V, Q(G, Δ) ≥ Q(G, Γ). Computing a Δ that maximizes $Q_{P_{edge}}(G, \Delta)$ is NP-complete [48], and the same holds for computing a Clustering that maximizes $Q^{\tau}_{Conf}$. However, when the number of vertices of a Graph G = (V, E) is small, the problem of maximizing the modularity can be turned into a reasonably tractable Integer Linear Program (see [48]): we define n² decision variables $X_{ij} \in \{0, 1\}$, one for each pair of vertices i, j ∈ V. The key idea is that we can build an equivalence relation on V (i ∼ j iff $X_{ij} = 1$) and therefore a partitioning of V. To guarantee that the decision variables give rise to an equivalence relation, they must satisfy the following constraints:
Reflexivity: ∀i ∈ V, $X_{ii} = 1$;
Symmetry: ∀i, j ∈ V, $X_{ij} = X_{ji}$;
Transitivity: ∀i, j, k ∈ V, $X_{jk} + X_{ik} - 2 \cdot X_{ij} \leq 1$.
With the following objective functions to maximize: [. . .]
The method SGC described in Algorithm 1 does not optimize a quality function, and the quality function used by Infomap cannot be expressed as an ℝ-valued symmetric similarity measure between vertices of G. We therefore left these two methods out of our study of optimality, since we cannot define their corresponding objective functions to maximize in the way we did for $Q_{P_{edge}}$ and $Q^{\tau}_{Conf}$ with formulas 19 and 20. In Fig 1, on a small artificial Graph $G^1_{toy}$, we compare the optimal Clusterings Δ where: • Δ

Binary edge-classifier by nodes blocks
Which metric should be used to estimate the accuracy of the four Clusterings in Fig 1? Much literature addresses this fundamental question [49][50][51]. Here we propose the definition of a Binary Edge-Classifier By nodes Blocks (BECBB). To measure the quality of a Clustering Γ on a Graph G = (V, E), an intuitive, simple and efficient approach is to consider the Clustering Γ (with or without overlaps) as a BECBB trying to predict the edges of the Graph: it classifies each pair of vertices into one of two classes, PositiveEdge and NegativeEdge.
Definition: A BECBB is a binary classifier of pairs of nodes that tries to predict the edges of a Graph. It is not allowed to give two complementary sets of pairs of nodes, one for its PositiveEdge predictions and the complementary set for its NegativeEdge predictions; instead, it is forced to provide its predictions in the form of blocks of nodes $B_i \subseteq V$, classifying a pair {x, y} as PositiveEdge if ∃i such that x, y ∈ $B_i$, and as NegativeEdge otherwise. If blocks are allowed to overlap then it is a $BECBB_{OV}$, else it is a $BECBB_{NO}$.
Let Γ be a Clustering (with or without overlaps) of a Graph G = (V, E), and let
$$Pairs(\Gamma) = \bigcup_{\gamma \in \Gamma} P^{\gamma}_2 \quad (21)$$
where $P^{\gamma}_2$ is the set of pairs of vertices of γ. We can then measure the accuracy of Γ with the classical measures of diagnostic binary Classification [52,53]:
$$Precision(Pairs(\Gamma), E) = \frac{|Pairs(\Gamma) \cap E|}{|Pairs(\Gamma)|}, \quad Recall(Pairs(\Gamma), E) = \frac{|Pairs(\Gamma) \cap E|}{|E|}, \quad Fscore(Pairs(\Gamma), E) = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}$$
We can use these three measures indifferently on Clusterings with or without overlaps, because Eq 21 makes sense in both cases.
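Evaluating a Clustering as a BECBB only requires the set of pairs of vertices that share a Module and the set of edges. The following sketch uses the classical definitions of Precision, Recall and F-score; the function names `pairs_of` and `becbb_scores` and the toy Graph are assumptions made for the example, and self-loops are assumed absent from the edge list.

```python
from itertools import combinations


def pairs_of(clustering):
    """All unordered pairs of distinct vertices sharing at least one Module."""
    return {
        frozenset(pair)
        for module in clustering
        for pair in combinations(module, 2)
    }


def becbb_scores(edges, clustering):
    """Precision, Recall and F-score of a Clustering seen as an edge classifier."""
    gold = {frozenset(edge) for edge in edges}
    predicted = pairs_of(clustering)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    fscore = 2 * precision * recall / (precision + recall)
    return precision, recall, fscore


# Hypothetical toy Graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge {2, 3}.
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
two_modules = becbb_scores(toy_edges, [[0, 1, 2], [3, 4, 5]])
one_module = becbb_scores(toy_edges, [[0, 1, 2, 3, 4, 5]])
overlapping = becbb_scores(toy_edges, [[0, 1, 2, 3], [2, 3, 4, 5]])
```

Because `pairs_of` builds a set union, the same code applies unchanged to Clusterings with overlapping Modules, as in the last call.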

Properties
As shown in [51], it is better for a metric σ(Γ) that estimates the accuracy of a Clustering Γ to have the Homogeneity and Completeness [50] properties (see Fig 2, inspired by Figs 1 and 3 in [51]).
It is clear that the metric Fscore(Pairs(Γ), E) has these two properties, for any Clustering Γ with or without overlaps. Moreover, the metric Fscore(Pairs(Γ), E) is independent of any expectation extrinsic to the Graph: we only need to trust the Graph itself. It is a good objective way to evaluate and compare Clusterings. So, to estimate the accuracy of Clustering methods $Method_i$ and compare them on a Graph G = (V, E), we will use the three metrics:

Precision($Method_i$(G = (V, E)), E): Measuring the ability of $Method_i$ not to include non-edges in the Modules it returns;
Recall($Method_i$(G = (V, E)), E): Measuring the ability of $Method_i$ to include the edges in the Modules it returns;
Fscore($Method_i$(G = (V, E)), E): Measuring the harmonic mean of its Precision and Recall.

Starling, a heuristic for maximizing $Q^{\tau}_{Conf}$
In this section we describe Starling, a heuristic for tentatively maximizing $Q^{\tau}_{Conf}$. Confluence gives us an ordering on the edges of the Graph G = (V, E). In particular, sorting the edges {i, j} ∈ E by descending Confluence forms the basis of a new Module merging strategy, described in Algorithm 2, intended to optimize $Q^{\tau}_{Conf}$.

Algorithm 2 Starling: Graph Partitional Clustering
Different edges $\{i_1, j_1\} \in E$ and $\{i_2, j_2\} \in E$ might happen to have the exact same Confluence value ($Conf(G, i_1, j_1) = Conf(G, i_2, j_2)$), making the process (in Line 1) non-deterministic in general, because of its sensitivity to the order in which the edges with identical Confluence values are processed. A simple solution to this problem is to sort edges by first comparing their Confluence values and then using the lexicographic order on the words $i_1 j_1$ and $i_2 j_2$ when Confluence values are strictly identical.
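The deterministic ordering described above can be obtained with a composite sort key. This is a small sketch of the tie-breaking rule only, not of Algorithm 2 as a whole; the function name `ordered_edges` and the Confluence values in the example are hypothetical.

```python
def ordered_edges(edges, conf):
    """Sort edges by descending Confluence, breaking ties lexicographically.

    `conf` maps each edge, written as a sorted tuple (i, j), to its Confluence.
    """
    normalised = [tuple(sorted(edge)) for edge in edges]
    # Negating the value gives a descending order on Confluence, while the
    # tuple (i, j) itself gives an ascending lexicographic order on ties.
    return sorted(normalised, key=lambda edge: (-conf[edge], edge))


# Hypothetical Confluence values, with a tie between (0, 1) and (1, 2).
toy_conf = {(0, 1): 0.5, (1, 2): 0.5, (0, 2): 0.3, (2, 3): -0.1}
order_a = ordered_edges([(2, 3), (2, 1), (0, 2), (1, 0)], toy_conf)
order_b = ordered_edges([(1, 2), (0, 1), (3, 2), (2, 0)], toy_conf)
```

Whatever the order and orientation in which the edges are supplied, the result is the same list, so a merging strategy that consumes it becomes reproducible.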
We coded this Algorithm in C++ and, in the following, we used this program to analyze Starling's results. With $G^1_{toy}$, $Starling(\tau, G^1_{toy})$ finds the optimal Clusterings for $Q^{\tau}_{Conf}$: $Starling(0.00, G^1_{toy}) = \Delta$

Performance
In this section we estimate the accuracy of Starling and compare it with the methods Louvain, Infomap and SGC. We can estimate the accuracy of Clustering Algorithms on:

Real Graphs: A set of Terrain-Graphs built from real data;
A Benchmark B: A set of computer-generated Graphs and its gold standard $\Gamma_B$, its expected Modules as expected overconnected regions.
Because Louvain and Infomap do not need κ, the number of vertex groups, in their input, whereas SGC does, for greater clarity we compare on the one hand Starling versus Louvain and Infomap, and on the other hand Starling versus SGC.

Performance on Real Terrain-Graphs.
In this section we estimate the accuracy of Algorithms with three Terrain-Graphs:
• $G_{Email}$: The Graph was generated using email data from a large European research institution [54,55]. The Graph contains an undirected edge {i, j} if person i sent person j at least one email (https://snap.stanford.edu/data/email-Eu-core.html).
• $G_{DBLP}$: The DBLP computer science bibliography provides a comprehensive list of research papers in computer science [56]. Two authors are connected if they have published at least one paper together (https://snap.stanford.edu/data/com-DBLP.html).
• $G_{Amazon}$: The Graph was collected by crawling the Amazon website. It is based on the Customers Who Bought This Item Also Bought feature of the Amazon website [56]. If a product i is frequently co-purchased with product j, the Graph contains an undirected edge {i, j} (https://snap.stanford.edu/data/com-Amazon.html).
Table 1 illustrates the pedigrees of these Terrain-Graphs and Table 2 shows the accuracies of Louvain, Infomap and Starling, considering each Clustering as a BECBB. We also show the number of Modules, the size of the biggest Module and the computation time in seconds (all times are based on computations with a Quad Core Intel i5 and 32 GB of RAM).
• Louvain: This is the fastest method; however, its Precision is small, and it produces very few Modules, one of which is very large;
• Infomap: It gets a good Fscore, higher than that of Louvain.
• $Starling_\tau$: ∃τ ∈ [0, 1] such that Starling(G, τ) gets the highest Fscore. By default τ = 0.25 is a good compromise to obtain at the same time a good Precision and a good Recall. If we want to promote Recall (more edges in Modules) then we can decrease τ, and if we want to promote Precision (fewer non-edges in Modules) then we can increase τ.

Performance on Benchmark ER .
$Benchmark_{ER}$ is the class of Random Graphs studied by Erdős and Rényi [57,58], with parameters N, the number of vertices, and p, the connection probability between two vertices. Random Graphs do not have a meaningful group structure, and they can be used to test whether the Algorithms are able to recognize the absence of Modules. Therefore, we set N = 128, and we study the accuracy of the methods with $Benchmark_{ER}$ according to p. Fig 4 shows the accuracy of the methods according to p, considering each Clustering as a BECBB. We can see that:
• $Oracle_{ER}$ knows $\Gamma_{ER}$, but does not know the concretely constructed edges $E_{G_{ER}}$. Its number of Modules is always 1. Its Precision increases when p increases, because density increases. Its Recall is always 1. Its Fscore increases;
[. . .]
The phenomenon of overconnected regions is particularly clear in Terrain-Graphs, but it also occurs in Erdős-Rényi Random Graphs. Indeed such Graphs are not completely uniform: they present an embryo of structure with slightly-overconnected regions resulting from Random fluctuations (for example the Module $\delta_4$, which is clearly overconnected in this Graph).
It is these slightly-overconnected regions present in Random Graphs that are exploited and amplified in [59] to transform a Random Graph into a shaped-like Terrain-Graph, and that Starling detects in a Random Graph and so accepts as Modules (especially if τ increases). This is why, in (ii), $Starling_\tau$ returns Modules whose density is greater than that of the entire Graph: the slightly-overconnected regions (especially if τ increases). In other words, $Starling_\tau$ identifies the presence of weak structures.
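The behaviour of the Oracle on Erdős-Rényi Graphs can be checked with a short simulation. This is a minimal sketch under stated assumptions: the generator `erdos_renyi`, the helper `oracle_er_scores` and the fixed seed are choices made for the example, and the Oracle is taken to return a single Module containing every vertex.

```python
import random
from itertools import combinations


def erdos_renyi(n, p, seed=0):
    """Random Graph on n vertices where each pair is linked with probability p."""
    rng = random.Random(seed)
    return [pair for pair in combinations(range(n), 2) if rng.random() < p]


def oracle_er_scores(n, edges):
    """BECBB scores of the single-Module Clustering {V} (edges must be non-empty)."""
    n_pairs = n * (n - 1) // 2
    precision = len(edges) / n_pairs  # this is exactly the density of the Graph
    recall = 1.0  # every edge lies inside the single Module
    fscore = 2 * precision * recall / (precision + recall)
    return precision, recall, fscore


sparse = oracle_er_scores(128, erdos_renyi(128, 0.1))
dense = oracle_er_scores(128, erdos_renyi(128, 0.5))
```

With a single Module, Precision equals the density of the Graph, so it grows with p while Recall stays at 1, which matches the description of the Oracle given above.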

Performance on Benchmark LFR .
In most Terrain-Graphs, the distribution of degrees is well approximated by a power law. Similarly, in most Terrain-Graphs, the distribution of community sizes is well approximated by a power law [40,60]. Therefore, in order to produce artificial Graphs with a meaningful group structure similar to most Terrain-Graphs, Lancichinetti, Fortunato and Radicchi proposed $Benchmark_{LFR}$ [61] (code to generate $Benchmark_{LFR}$ Graphs can be downloaded from Andrea Lancichinetti's homepage https://sites.google.com/site/andrealancichinetti/home). The Graphs in $Benchmark_{LFR}$ are parameterized with:
• N, their number of vertices;
• k, their average degree;
• γ, the power law exponent of their degree distribution;
• β, the power law exponent of their community size distribution;
• μ ∈ [0, 1], their mixing parameter: each vertex shares a fraction 1 − μ of its links with the other vertices of its community and a fraction μ with the other vertices of the Graph.
With $Benchmark_{LFR}$, when the mixing parameter μ is small, the overconnected regions are well separated from each other, and when μ increases, the overconnected regions are less clear. Therefore, we set N = 1000, and k = 15 or k = 25, and (γ = 2, β = 1) or (γ = 2, β = 2) or (γ = 3, β = 1), and for each of these six configurations we study the accuracy of the methods according to μ.
Let $G_{LFR} = (V_{G_{LFR}}, E_{G_{LFR}})$ be a Graph built by $Benchmark_{LFR}$, $\Gamma_{G_{LFR}}$ its expected Modules as expected overconnected regions, and $Oracle_{LFR}(G_{LFR}) = \Gamma_{G_{LFR}}$ the Oracle's method, which knows the $\Gamma_{G_{LFR}}$ of each $G_{LFR}$.
We show in Figs 5 and 6 the accuracy of the methods according to μ, considering each Clustering as a BECBB. We can see that:
• $Oracle_{LFR}$ knows the $\Gamma_{G_{LFR}}$ of each $G_{LFR}$, but does not know their concretely constructed edges $E_{G_{LFR}}$. Its number of Modules is always $|\Gamma_{G_{LFR}}|$. Its Precision decreases when μ increases, because there are more and more non-edges in the expected Modules, but $Oracle_{LFR}$ does not know it. Its Recall decreases when μ increases, because there are more and more edges outside the expected Modules, but $Oracle_{LFR}$ does not know it. Its Fscore decreases when μ increases, because its Precision and its Recall decrease;
• The best Precisions are achieved by $Starling_{\tau=0.25}$, but with a lot of Modules when the overconnected regions are less clear (because here again (see section 7.1.2.2) Starling identifies as Modules the large number of small slightly-overconnected regions present in these Graphs);
• The best Recalls are achieved by Infomap, but with very few Modules, and often only one, when the overconnected regions are less clear (because there is no way to compress the description of the path of a Random walker in these Graphs);
• The best Fscores are achieved by Infomap and $Starling_{\tau=0.25}$, except when the overconnected regions are less clear, in which case the best Fscore is achieved by $Starling_{\tau=0.25}$.

Performance on Real Terrain-Graphs.
In this section, we compare Starling(G, τ) with SGC(G, κ), κ varying, on three small Terrain-Graphs:
• $G_{Email}$: The Graph seen in section 7.1.1;
• $G^{811}_{dblp}$: The subGraph of $G_{dblp}$ on the vertices of the largest Module of Infomap($G_{dblp}$), which has 811 vertices;
• $G^{380}_{amazon}$: The subGraph of $G_{amazon}$ on the vertices of the largest Module of Infomap($G_{amazon}$), which has 380 vertices.
Table 3 illustrates the pedigrees of these Terrain-Graphs. The dataset describing $G_{Email}$ contains "ground-truth" community memberships of the nodes, $C : V_{G_{Email}} \to D$. Each individual belongs to exactly one of 42 departments $D = \{d_1, \dots, d_{42}\}$ at the research institute from which the emails are extracted. Let $\Gamma_{Dep}$ be the Gold-Standard partition of $V_{G_{Email}}$ such that two vertices i and j belong to the same Module iff C(i) = C(j). We can therefore evaluate the quality of a Clustering by partition on $G_{Email}$ according to two kinds of truths: on the one hand according to the Intrinsic-Truth $E_{G_{Email}}$ in Fig 7(a), and on the other hand according to the Extrinsic-Truth $Pairs(\Gamma_{Dep}) = \bigcup_{\gamma \in \Gamma_{Dep}} P^{\gamma}_2$ in Fig 7(b). We can see that:
• According to the Intrinsic-Truth $E_{G_{Email}}$ in Fig 7(a) [. . .]
That is to say that Gold-Standards are not always the best BECBBs: we cannot always trust Gold-Standards provided by Benchmarks or built using human assessors, who, as shown in [62], do not always agree with each other, even when their judgements are based on the same protocol.
In our present example with $G_{Email}$, we can think that two individuals from the same department communicate in real life more often than two individuals from different departments; however, two individuals from the same department do not necessarily need to communicate more by email than two individuals from different departments.

Performance on Benchmark ER.
SGC needs κ, the number of groups of vertices, to be given in advance in its Input. To be able to compare Starling with SGC, we therefore define: SGC τ (G) = SGC(G, κ = |Starling(G, τ)|).
Let G ER = (V G ER , E G ER ) be a Random Graph built by Benchmark ER , Γ ER = {V} the expected Clustering with only one Module, and Oracle ER (G ER ) = Γ ER = {V} the Oracle method that knows Γ ER .
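The definition of SGC τ can be read as a small wrapper around two Clustering methods. The sketch below only shows this composition: the parameters starling and sgc are assumed to be callables supplied by the reader, and the two placeholder functions are hypothetical stand-ins that do not implement Starling or Spectral-Clustering.

```python
def sgc_tau(graph, tau, starling, sgc):
    """SGC run with kappa set to the number of Modules found by Starling(G, tau)."""
    kappa = len(starling(graph, tau))
    return sgc(graph, kappa)


# Placeholder methods (assumptions, for demonstration only).
def placeholder_starling(graph, tau):
    # Returns a fixed Clustering with three Modules, whatever tau is.
    return [{0, 1, 2}, {3, 4, 5}, {6}]


def placeholder_sgc(graph, kappa):
    # Splits the vertices round-robin into kappa groups.
    vertices = sorted(graph)
    return [set(vertices[k::kappa]) for k in range(kappa)]


toy_graph = {v: [] for v in range(7)}  # adjacency lists are unused by the placeholders
clustering = sgc_tau(toy_graph, 0.25, placeholder_starling, placeholder_sgc)
print(len(clustering))
```

Passing the two methods as arguments keeps the wrapper independent of any particular implementation of either method.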
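As a worked illustration, assume that Oracle ER is evaluated as a BECBB against the Intrinsic-Truth E G ER , that is, that every pair of vertices inside a Module is predicted to be an edge. With a single Module {V}, all pairs are predicted, so with n = |V G ER | and m = |E G ER | the scores would reduce to:

```latex
\mathrm{Precision} = \frac{m}{\binom{n}{2}} = \frac{2m}{n(n-1)}, \qquad
\mathrm{Recall} = \frac{m}{m} = 1, \qquad
\mathrm{Fscore} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
               = \frac{2 \cdot \mathrm{Precision}}{1 + \mathrm{Precision}}
```

Under this assumption the Precision of the Oracle is simply the density of the Random Graph, which is low for a sparse Graph.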

Choosing the τ parameter of Starling
When using a Benchmark B to evaluate the performance of methods on a Graph G B = (V G B , E G B ), the Oracle method Oracle B knows the expected overconnected regions Γ G B but does not know the concretely constructed edges E G B . Therefore, when the overconnected regions are less clear, some methods may outperform the Oracle B method as BECBB (with Gold = E G B , Intrinsic-Truth). This happens especially with the Starling τ method if the τ parameter has been chosen appropriately.
We have seen in Formula 17 that the closer the τ ∈ [0, 1] parameter is to 1, the less Confluence is taken into account in Q τ Conf . With Terrain-Graphs, we propose using τ = 0.25 as a first

Length of Random walks
For clarity and simplicity, we restricted the Random walks of P t G (i⇝j) to a length of t = 3. A first study of the impact of the length of those Random walks on transforming a Random Graph into a Graph shaped like a Terrain-Graph was done in [59], but a deeper one should be carried out to understand how the length influences the mesoscopicity of Confluence and its effect on Q Conf and Starling.
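To make the role of the length t concrete, the sketch below computes where a Random walker can be after exactly t steps. It assumes a plain walk that moves to a uniformly chosen neighbor at each step, on a Graph without isolated vertices; the exact definition of P t G (i⇝j) used in the paper may differ (for instance in how self-loops are handled), and the function name is illustrative.

```python
def walk_probabilities(adjacency, start, t):
    """Distribution of the position of a random walker after exactly t steps.

    Assumption: at each step the walker moves to a neighbor chosen uniformly
    at random, and every vertex has at least one neighbor.
    """
    distribution = {start: 1.0}
    for _ in range(t):
        next_distribution = {}
        for vertex, probability in distribution.items():
            share = probability / len(adjacency[vertex])
            for neighbor in adjacency[vertex]:
                next_distribution[neighbor] = next_distribution.get(neighbor, 0.0) + share
        distribution = next_distribution
    return distribution


# Toy path Graph 0 - 1 - 2 - 3.
toy_path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
after_three_steps = walk_probabilities(toy_path, start=0, t=3)
print(after_three_steps)
```

With t = 3 the walker starting at vertex 0 can only be found at vertices 1 or 3, which shows how a short length restricts the walk to a neighborhood of its starting vertex.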
For example, we can build the Graph G 2$ toy from G 2 toy by inserting a new vertex in the middle of each edge. Fig 12 illustrates the optimal Clusterings on G 2 toy and on G 2$ toy for Q 0.0 Conf with t = 3 and also with t = 6, allowing us to see that: On G 2$ toy with t = 6: On G 2 toy with t = 3:
The length of Random walks t could be advantageously chosen taking into account L, the average number of edges on the shortest path between two vertices.
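The construction that inserts a new vertex in the middle of each edge can be sketched as follows, together with a breadth-first search showing its effect on shortest paths. The toy edge list, the function names and the ("mid", i, j) naming of the inserted vertices are illustrative assumptions; the toy Graph is not the G 2 toy of the paper.

```python
from collections import deque


def subdivide(edges):
    """Insert one new vertex in the middle of each edge."""
    new_edges = []
    for i, j in edges:
        middle = ("mid", i, j)  # hypothetical name for the inserted vertex
        new_edges.append((i, middle))
        new_edges.append((middle, j))
    return new_edges


def distance(edges, source, target):
    """Number of edges on a shortest path, found by breadth-first search."""
    adjacency = {}
    for i, j in edges:
        adjacency.setdefault(i, []).append(j)
        adjacency.setdefault(j, []).append(i)
    depth = {source: 0}
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        if vertex == target:
            return depth[vertex]
        for neighbor in adjacency.get(vertex, []):
            if neighbor not in depth:
                depth[neighbor] = depth[vertex] + 1
                queue.append(neighbor)
    return None  # target is not reachable from source


# Toy Graph: a triangle 0-1-2 with a pendant vertex 3.
toy_edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
toy_subdivided = subdivide(toy_edges)

print(distance(toy_edges, 0, 3), distance(toy_subdivided, 0, 3))
```

Every shortest path between two original vertices doubles in length after the subdivision, which is one reason to consider choosing t in relation to L.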

Directed graphs
If G is a Graph positively weighted by W = {w i,j such that {i, j} ∈ E}, then we can apply Q Conf and Starling by replacing Eqs 7 and 10 by Eqs 27 and 28 respectively: Conf t (G, i, j) =
If G is a directed Graph, one can also consider using a variant of PageRank [63][64][65] in place of Eq 8.

Conclusions and perspectives
In this paper, we defined Confluence, a mesoscopic vertex closeness measure based on short Random walks, which brings together vertices from the same overconnected region, and separates vertices coming from two distinct overconnected regions. Then we used Confluence to define Q τ Conf , a new Clustering quality function, where the τ ∈ [0, 1] parameter is a handle on the Precision & Recall, the size and the number of Modules. With a small toy Graph, we showed that optimal Clusterings for Q τ Conf improve the Fscore of the optimal Clusterings for Modularity.
We then introduced Starling(G, τ), a new heuristic based on the Confluence of edges designed to optimize Q τ Conf on a Graph G. On the same small toy Graph, we showed that Starling(G, τ) finds an optimal Clustering for Q τ Conf . Comparing Starling(G, τ) to SGC(G, κ), Infomap, and Louvain, we showed that: • Performance with the Terrain-Graphs studied in this paper: (2) Often (τ dependent), Starling(G, τ), thanks to its (ii) behavior, is able to get larger Fscores than those of Oracles that would only know the expected overconnected regions (concretely slightly-overconnected), ignoring E, the concretely constructed edges. SGC(G, κ = |Starling(G, τ = 0.25)|) can also succeed (see Fig 10(a) and 10(c)), but remains weaker than Starling(G, τ = 0.25), whereas Infomap can never succeed, because of its (i) behavior.
To sum up: If we know the right number of groups of vertices κ in advance, then we can use SGC. If we do not know it, then we can use Infomap on the one hand and Starling on the other hand, which are complementary: Infomap tends to favor Recall with good Fscore and is able to identify the absence of strong structures; Starling τ=0.25 by default tends to favor Precision with good Fscore and is able to identify the presence of weak structures. Then if we want to promote Recall with a smaller number of larger Modules, we can decrease τ, and if we want to promote Precision with a greater number of smaller Modules, we can increase τ.
Our follow-up work: We will focus on the role played by the length of the Random walks used in computing Confluence on the outputs of Starling, as well as on the development of a Clustering method based on Confluence that is able to detect Clusterings in Graphs while accounting for edge directions and edge weights, and that returns possibly overlapping communities.